模糊文物可以严重降低图像的视觉质量,并且已经提出了许多用于特定场景的脱模方法。然而,在大多数现实世界的图像中,模糊是由不同因素引起的,例如运动和散焦。在本文中,我们解决了不同的去纹身方法如何在一般类型的模糊上进行。对于深入的性能评估,我们构建一个名为(MC-Blur)的新型大规模的多个原因图像去孔数据集,包括现实世界和合成模糊图像,具有模糊的混合因素。采用不同的技术收集所提出的MC-Blur数据集中的图像:卷积超高清(UHD)具有大核的锐利图像,平均由1000 FPS高速摄像头捕获的清晰图像,向图像添加Defocus,而且真实-world模糊的图像由各种相机型号捕获。这些结果概述了当前的去纹理方法的优缺点。此外,我们提出了一种新的基线模型,适应多种模糊的原因。通过包括对不同程度的特征的不同重量,所提出的网络导出更强大的特征,重量分配给更重要的水平,从而增强了特征表示。新数据集上的广泛实验结果展示了多原因模糊情景所提出的模型的有效性。
translated by 谷歌翻译
当使用星座协同作用来对较大的区域进行侦察成像时,需要以最少的观察资源消耗来达到覆盖能力要求,以获得最佳的星座观测方案。本文以最少数量的卫星数量作为优化目标满足实时地面覆盖要求,提出了对卫星星座配置的优化设计,以通过使用改进的模拟退火算法结合实际覆盖大型区域成像。 - 六边形离散化的时间覆盖评估方法。该算法可以适应实验条件,具有良好的效率,并且可以满足工业准确性要求。在模拟应用中测试了算法的有效性和适应性。
translated by 谷歌翻译
对于黑盒攻击,替代模型和受害者模型之间的差距通常很大,这表现为弱攻击性能。通过观察到,可以通过同时攻击多样的模型来提高对抗性示例的可传递性,并提出模型增强方法,这些模型通过使用转换图像模拟不同的模型。但是,空间域的现有转换不会转化为显着多样化的增强模型。为了解决这个问题,我们提出了一种新型的频谱模拟攻击,以针对正常训练和防御模型制作更容易转移的对抗性例子。具体而言,我们将频谱转换应用于输入,从而在频域中执行模型增强。从理论上讲,我们证明了从频域中得出的转换导致不同的频谱显着图,这是我们提出的指标,以反映替代模型的多样性。值得注意的是,我们的方法通常可以与现有攻击结合使用。 Imagenet数据集的广泛实验证明了我们方法的有效性,\ textit {e.g。},攻击了九个最先进的防御模型,其平均成功率为\ textbf {95.4 \%}。我们的代码可在\ url {https://github.com/yuyang-long/ssa}中获得。
translated by 谷歌翻译
我们专注于在不同情况下在车道检测中桥接域差异,以大大降低自动驾驶的额外注释和重新训练成本。关键因素阻碍了跨域车道检测的性能改善,即常规方法仅着眼于像素损失,同时忽略了泳道的形状和位置验证阶段。为了解决该问题,我们提出了多级域Adaptation(MLDA)框架,这是一种在三个互补语义级别的像素,实例和类别的互补语义级别处理跨域车道检测的新观点。具体而言,在像素级别上,我们建议在自我训练中应用跨级置信度限制,以应对车道和背景的不平衡置信分布。在实例层面上,我们超越像素,将分段车道视为实例,并通过三胞胎学习促进目标域中的判别特征,这有效地重建了车道的语义环境,并有助于减轻特征混乱。在类别级别,我们提出了一个自适应域间嵌入模块,以在自适应过程中利用泳道的先验位置。在两个具有挑战性的数据集(即Tusimple和Culane)中,我们的方法将车道检测性能提高了很大的利润率,与先进的领域适应算法相比,精度分别提高了8.8%和F1级的7.4%。
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译